Domain adaptation for sequence labeling using hidden Markov models
Most natural language processing systems based on machine learning are not
robust to domain shift. For example, a state-of-the-art syntactic dependency
parser trained on Wall Street Journal sentences has an absolute drop in
performance of more than ten points when tested on textual data from the Web.
An efficient solution to make these methods more robust to domain shift is to
first learn a word representation using large amounts of unlabeled data from
both domains, and then use this representation as features in a supervised
learning algorithm. In this paper, we propose to use hidden Markov models to
learn word representations for part-of-speech tagging. In particular, we study
the influence of using data from the source, the target or both domains to
learn the representation and the different ways to represent words using an
HMM.
Comment: New Directions in Transfer and Multi-Task: Learning Across Domains
and Tasks (NIPS Workshop) (2013)
A convex relaxation for weakly supervised relation extraction
A promising approach to relation extraction, called weak or distant supervision, exploits an existing database of facts as training data, by aligning it to an unlabeled collection of text documents. Using this approach, the task of relation extraction can easily be scaled to hundreds of different relationships. However, distant supervision leads to a challenging multiple instance, multiple label learning problem. Most of the proposed solutions to this problem are based on non-convex formulations, and are thus prone to local minima. In this article, we propose a new approach to the problem of weakly supervised relation extraction, based on discriminative clustering and leading to a convex formulation. We demonstrate that our approach outperforms state-of-the-art methods on the challenging dataset introduced by Riedel et al. (2012).
Bag of Tricks for Efficient Text Classification
This paper explores a simple and efficient baseline for text classification.
Our experiments show that our fast text classifier fastText is often on par
with deep learning classifiers in terms of accuracy, and many orders of
magnitude faster for training and evaluation. We can train fastText on more
than one billion words in less than ten minutes using a standard multicore CPU,
and classify half a million sentences among 312K classes in less than a minute.
Adaptive Attention Span in Transformers
We propose a novel self-attention mechanism that can learn its optimal
attention span. This allows us to significantly extend the maximum context size
used in Transformers while maintaining control over their memory footprint and
computational time. We show the effectiveness of our approach on the task of
character-level language modeling, where we achieve state-of-the-art
performance on text8 and enwiki8 by using a maximum context of 8k characters.
Comment: Accepted to ACL 201
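The abstract above does not spell out how the span is learned. As a rough illustration only, one way to make an attention span trainable is to multiply attention weights by a soft mask that ramps linearly from 1 to 0 as the distance to the attended token exceeds the span. The function names, the `ramp` parameter, and the use of plain Python lists are assumptions made for this sketch, not details taken from the listing.

```python
import math

def soft_span_mask(distance, span, ramp):
    """Piecewise-linear mask: 1 inside the span, decaying to 0 over `ramp` steps.

    `span` plays the role of the learnable span; `ramp` (assumed > 0) sets
    how gradually the mask falls off. Both names are hypothetical.
    """
    return min(max((ramp + span - distance) / ramp, 0.0), 1.0)

def masked_attention(scores, span, ramp):
    """Softmax over scores, re-weighted by the soft span mask.

    scores[d] is the unnormalized score for the token d positions back.
    Assumes span >= 0, so the token at distance 0 is never masked out.
    """
    top = max(scores)
    weights = [
        soft_span_mask(d, span, ramp) * math.exp(s - top)
        for d, s in enumerate(scores)
    ]
    total = sum(weights)
    return [w / total for w in weights]

attention = masked_attention([0.0] * 8, span=4, ramp=2)
```

Tokens beyond `span + ramp` positions receive exactly zero weight, which is what would let an implementation skip them and bound memory and compute.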
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
Generative models for open domain question answering have proven to be
competitive, without resorting to external knowledge. While promising, this
approach requires models with billions of parameters, which are
expensive to train and query. In this paper, we investigate how much these
models can benefit from retrieving text passages, potentially containing
evidence. We obtain state-of-the-art results on the Natural Questions and
TriviaQA open benchmarks. Interestingly, we observe that the performance of
this method significantly improves when increasing the number of retrieved
passages. This is evidence that generative models are good at aggregating and
combining evidence from multiple passages.
Distilling Knowledge from Reader to Retriever for Question Answering
The task of information retrieval is an important component of many natural
language processing systems, such as open domain question answering. While
traditional methods were based on hand-crafted features, continuous
representations based on neural networks have recently obtained competitive results.
A challenge of using such methods is to obtain supervised data to train the
retriever model, corresponding to pairs of query and support documents. In this
paper, we propose a technique to learn retriever models for downstream tasks,
inspired by knowledge distillation, and which does not require annotated pairs
of query and documents. Our approach leverages attention scores of a reader
model, used to solve the task based on retrieved documents, to obtain synthetic
labels for the retriever. We evaluate our method on question answering,
obtaining state-of-the-art results.
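As a minimal sketch of the idea described above, the reader's attention scores over the retrieved documents can be normalized into a target distribution, and the retriever can be trained so that its own score distribution matches it. The choice of a KL-divergence loss, the function names, and the toy scores are assumptions made for illustration; the abstract states only that attention scores serve as synthetic labels.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(reader_attention, retriever_scores):
    """KL(target || prediction) between two distributions over documents.

    reader_attention: one aggregated attention score per retrieved document
    (how that aggregate is computed is left open here).
    retriever_scores: the retriever's relevance score for the same documents.
    """
    target = softmax(reader_attention)
    predicted = softmax(retriever_scores)
    return sum(
        t * math.log(t / p) for t, p in zip(target, predicted) if t > 0
    )

# Hypothetical scores for three retrieved documents.
loss = distillation_loss([2.0, 0.5, 0.1], [0.3, 1.2, 0.4])
```

The loss is zero when the retriever ranks documents exactly as the reader's attention does, and grows as the two distributions diverge, so no annotated query-document pairs are needed.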